Analyzing the Impact of Spelling Errors on POS-Tagging and Chunking in Learner English
نویسندگان
چکیده
Part-of-speech (POS) tagging and chunking have been used in tasks targeting learner English; however, to the best our knowledge, few studies have evaluated their performance and no studies have revealed the causes of POStagging/chunking errors in detail. Therefore, we investigate performance and analyze the causes of failure. We focus on spelling errors that occur frequently in learner English. We demonstrate that spelling errors reduced POS-tagging performance by 0.23% owing to spelling errors, and that a spell checker is not necessary for POS-tagging/chunking of learner English.
منابع مشابه
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملJoint English Spelling Error Correction and POS Tagging for Language Learners Writing
We propose an approach to correcting spelling errors and assigning part-of-speech (POS) tags simultaneously for sentences written by learners of English as a second language (ESL). In ESL writing, there are several types of errors such as preposition, determiner, verb, noun, and spelling errors. Spelling errors often interfere with POS tagging and syntactic parsing, which makes other error dete...
متن کاملA Fast Boosting-based Learner for Feature-Rich Tagging and Chunking
Combination of features contributes to a significant improvement in accuracy on tasks such as part-of-speech (POS) tagging and text chunking, compared with using atomic features. However, selecting combination of features on learning with large-scale and feature-rich training data requires long training time. We propose a fast boosting-based algorithm for learning rules represented by combinati...
متن کاملRecognizing Noisy Romanized Japanese Words in Learner English
This paper describes a method for recognizing romanized Japanese words in learner English. They become noise and problematic in a variety of tasks including Part-Of-Speech tagging, spell checking, and error detection because they are mostly unknown words. A problem one encounters when recognizing romanized Japanese words in learner English is that the spelling rules of romanized Japanese words ...
متن کاملRule Extraction from A Trained Conditional Random Field Model
Conditional Random Field (CRF) has proven to be highly successful for sequence labeling problems like part of speech tagging, segmentation etc. However, the model acts like a black box, providing no insight into what is learned. We propose a system for rule extraction from CRF to assist comprehensibility of the model. Experiments on POS tagging and chunking problem in English are performed as c...
متن کامل